excess loss
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- (2 more...)
Directional Optimization Asymmetry in Transformers: A Synthetic Stress Test
Transformers are theoretically reversal-invariant: their function class does not prefer left-to-right over right-to-left mappings. Yet empirical studies on natural language repeatedly report a "reversal curse," and recent work on temporal asymmetry in LLMs suggests that real-world corpora carry their own arrow of time. This leaves an unresolved question: do directional failures stem from linguistic statistics, or from the architecture itself? We cut through this ambiguity with a fully synthetic, entropy-controlled benchmark designed as a clean-room stress test for directional learning. Using random string mappings with tunable branching factor K, we construct forward tasks with zero conditional entropy and inverse tasks with analytically determined entropy floors. Excess loss above these floors reveals that even scratch-trained GPT-2 models exhibit a strong, reproducible directional optimization gap (e.g., 1.16 nats at K=5), far larger than that of an MLP trained on the same data. Pre-trained initializations shift optimization behavior but do not eliminate this gap, while LoRA encounters a sharp capacity wall on high-entropy inverse mappings. Together, these results isolate a minimal, semantics-free signature of directional friction intrinsic to causal Transformer training-one that persists even when linguistic priors, token frequencies, and corpus-level temporal asymmetries are removed. Our benchmark provides a controlled instrument for dissecting directional biases in modern sequence models and motivates deeper mechanistic study of why inversion remains fundamentally harder for Transformers.
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (2 more...)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > Wisconsin (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)